DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 4:58 p.m.,
August 5, 2003
(#16) -
Erik Allen
In regards to correlation coefficients for 1Bbip vs. xbhbip - I don't think you have to hypothesize something about the inner workings of the game in order to get the results indicated.
Correlation coefficients for one year to the next should depend on the relative number of occurrences of each type of event in a given year. Extra base hits occur much less frequently than singles, so we would expect the relative variation in extra-base hits to be larger, year over year, and hence the correlation coefficient should be smaller.
Just a thought...I may be wrong on my theory
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 7:26 a.m.,
August 6, 2003
(#29) -
Erik Allen
Ross CW (#28)
While I agree with you that, in a strict sense, comparing correlation coefficients of two statistics from year to year is technically meaningless, I think in certain cases it can be useful. In McCracken's original DIPS work, he shows that there is simply MUCH less predictability in BABIP than in K/9, BB/9, etc. So much less, in fact, that sample size issues are probably not the sole cause. This discovery in itself is quite interesting, because it explains in some sense why it is difficult to predict pitcher ERA from one year to the next.
I think the larger problem is assigning a cause to this type of study. McCracken attributes the discrepancy in correlation coefficients to a COMPLETE lack of pitcher "control." Subsequent writing on this site and others has shown this to probably be false.
Summing up (wait, I actually had a point? :)) : I think that comparing correlation coefficients is a pretty rough test to use, and owing to sample size effects, and the old aphorism that "correlation does not imply causation," it is really difficult both to show an effect exists and to attribute a reason to that effect.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 9:47 a.m.,
August 6, 2003
(#31) -
Erik Allen
Mr. Tiger (would this be the correct formal address?),
Let me just start by saying that I agree with everything you say up to the end of the official quote. You are absolutely right, IMO, to say that correlation coefficients can give us an indication of how predictive a given statistic will be for the next year. For many situations, this is all we really need, since we are simply trying to project next year's performance...my only objection was in trying to relate these correlation coefficients to physical realities of the game (i.e. attributing blame or credit to the hitter, pitcher, or fielder). I think that going down that road is very difficult to justify.
I am not sure if the last part of your message was directed towards my comments, and I am not sure if we are talking about the same thing (my fault probably...I am not the most eloquent writer). So, let me expand upon post 16:
Imagine we have Joe Pitcher. Joe (or his fielders, or whoever) has a skill set such that 20% of balls in play fall in for singles, and 10% fall in for extra base hits. I am not sure about these numbers, but they seem to be in the right ballpark. For simplicity, let's treat these as independent, binomial variables. That is, we assume for each trial (a ball hit in play), there is a 20% chance that it falls in for a single, and an 80% chance that it does not. Similarly, for each trial, there is a 10% chance that the ball will fall in for an xbh, and a 90% chance that it does not. This is clearly a huge oversimplification, but it can suffice for now.
Over the course of the season, Joe Pitcher gives up 500 BIP. The expected value of singles should be
1B = n*p = 500*0.2 = 100.
Similarly, xbh = 50.
The standard deviation is given by
STD = sqrt(n*p*(1-p))
1BSTD = 8.94
xbhSTD = 6.71
Therefore, the relative standard deviation (STD divided by the expected value) is
1BRSD = 8.94 / 100 = 0.0894 = 8.94%
xbhRSD = 6.71 / 50 = 0.134 = 13.4%
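This back-of-the-envelope calculation can be sketched in a few lines (a minimal illustration; the 20%/10% rates and 500 BIP are the assumed values from the example above, not measured data):

```python
import math

def binomial_summary(n, p):
    """Expected count, standard deviation, and relative SD for n binomial trials."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return mean, std, std / mean

n_bip = 500  # balls in play for Joe Pitcher's season
for label, p in [("1B", 0.20), ("xbh", 0.10)]:
    mean, std, rsd = binomial_summary(n_bip, p)
    print(f"{label}: mean={mean:.0f}, std={std:.2f}, RSD={rsd:.1%}")
```

The rarer event (xbh) carries the larger relative spread, which is the whole point of the argument.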
The purpose of this analysis is to show that we would expect more year-to-year variability in xbhBIP simply because of sample size differences. So, say pitcher A is slightly better than pitcher B at preventing both 1Bbip and xbhbip, and by the same amount. We would expect, based on the above analysis, that we would more frequently OBSERVE pitcher B to be superior at preventing xbhbip than we would 1bbip. This could possibly explain the discrepancy you find, and we don't necessarily need to invoke any baseball reasoning to explain the data.
If you happen to locate a statistic that displays a HIGHER year-to-year correlation, even with a smaller numerator (i.e. hitter triple rates), then this would seem to imply that the differences in player ability outweigh the variability of the statistic.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:21 a.m.,
August 6, 2003
(#34) -
Erik Allen
Hmmm...that is very interesting stuff.
I can't say that my hypothesis is fully supported by your data, but at least some of it is predictable...
First, let's agree on some notation, to make this easier...
Going back to my college stat book (brush off dust...) I see that a correlation coefficient is defined as:
Corr = sum over i [(x_i - x_avg)*(y_i - y_avg)] / sqrt( sum over i (x_i - x_avg)^2 * sum over i (y_i - y_avg)^2 )
where x_i and y_i here would be the 1Bs allowed for two consecutive years. So, in your example, pitcher 1 gave up 203 and 201 singles in two consecutive years. So, x_1=203, y_1=201. x_avg and y_avg would be the true rates. x_avg=200, y_avg=100.
In your first simulation, all 20 pitchers should have the same ability. Therefore, if pitcherX were ABOVE average one year, we should not expect him to be ABOVE average the second year, and I would think that corr=0 for a sufficiently large sample. So either 1) 20 pitchers is too small a sample size or 2) I don't know what I am talking about. Give a 50/50 chance to both those possibilities. :)
The second case is closer to what I was imagining for a test...each pitcher has slightly different abilities. So, say I have 2 pitchers:
PITCHER A: 1B = 0.2, xbh = 0.1, out = 0.7
PITCHER B: 1B = 0.18, xbh = 0.09, out = 0.73
In both cases, PITCHER B is 10% better than pitcher A at preventing 1B and xbh. However, in the course of a given season, due to random variations, PITCHER A might show better "ability" than PITCHER B in both 1Bbip and xbhBIP. Further, I would expect A to beat B MORE OFTEN in xbhBIP due to the effects I described above. Since xbhBIP will be less indicative of their true talent levels, and more indicative of luck, we would expect the year-to-year correlation in xbhBIP to be lower than that of 1bBIP. We see that in your second test, where the tests go in order. However, as you say, 20 pitchers may not be enough to establish a trend.
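To put a rough number on "MORE OFTEN," here is a small simulation sketch (assuming the made-up rates above and 500 BIP per season) that estimates how often the truly better pitcher nevertheless looks no better over a season:

```python
import numpy as np

def frac_better_looks_worse(p_a, p_b, n_bip=500, n_seasons=100_000, seed=0):
    """Fraction of simulated seasons in which pitcher B (truly better,
    i.e. p_b < p_a) allows at least as many hits as pitcher A."""
    rng = np.random.default_rng(seed)
    a = rng.binomial(n_bip, p_a, n_seasons)  # pitcher A's observed hit counts
    b = rng.binomial(n_bip, p_b, n_seasons)  # pitcher B's observed hit counts
    return np.mean(a <= b)

# Pitcher B is 10% better than A at preventing both singles and xbh.
print("1B :", frac_better_looks_worse(0.20, 0.18))
print("xbh:", frac_better_looks_worse(0.10, 0.09))
```

In this sketch the reversal happens noticeably more often for xbh than for 1B, consistent with the argument above.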
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:22 a.m.,
August 6, 2003
(#35) -
Erik Allen
One error in my above post... y_avg = 200 also
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 1:31 p.m.,
August 6, 2003
(#37) -
Erik Allen
Tango,
I was very intrigued with your results above, so I repeated the simulations that you performed, to see if our results matched.
First, I ran the case where each pitcher has the same ability: 1B=0.2, xbh=0.1, out=0.7. I used 1000 balls in play as you did, but increased the number of pitchers to 10,000. For this case, I get year-over-year r values of:
xbh = 0.0039
1b = -0.0080
So, essentially no correlation, which is what I was hoping for.
For the second study, I also used 10,000 pitchers. However, in this case each pitcher was assigned a random value of 1B and xbh. For 1B I gave a range of 0.18 to 0.22. For xbh I gave a range of 0.09 to 0.11. So, on a relative basis, these are the same ranges. The correlation coefficients here are:
1B = 0.46
xbh = 0.28
So, from here we can see that there is significantly less predictability in xbh rate, despite the fact that the relative variation in the statistics is approximately the same.
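For anyone who wants to reproduce these numbers, here is a sketch along the lines described (my reconstruction of the method, not the original code; exact r values will wobble a bit from run to run):

```python
import numpy as np

def year_to_year_r(true_rates, n_bip, rng):
    """Simulate two independent seasons for each pitcher and return the
    correlation between the observed per-BIP rates."""
    y1 = rng.binomial(n_bip, true_rates) / n_bip
    y2 = rng.binomial(n_bip, true_rates) / n_bip
    return np.corrcoef(y1, y2)[0, 1]

rng = np.random.default_rng(1)
n_pitchers, n_bip = 10_000, 1000

# Case 1: every pitcher has identical true talent -> r should be ~0.
print("same talent, 1B :", year_to_year_r(np.full(n_pitchers, 0.20), n_bip, rng))

# Case 2: talent spread uniformly, same relative range for 1B and xbh.
print("spread, 1B :", year_to_year_r(rng.uniform(0.18, 0.22, n_pitchers), n_bip, rng))
print("spread, xbh:", year_to_year_r(rng.uniform(0.09, 0.11, n_pitchers), n_bip, rng))
```

The expected values can also be had analytically: r is roughly talent variance / (talent variance + p(1-p)/n), which gives about 0.45 for 1B and 0.27 for xbh here.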
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 2:58 p.m.,
August 6, 2003
(#39) -
Erik Allen
Yeah, I hadn't noticed that above, but the numbers are eerily similar!
As to your previous post, I reran some numbers, and you are absolutely correct:
1B Range: 0.18 - 0.22: corr = 0.46
xbh range: 0.09 - 0.11: corr = 0.28
out range: 0.67 - 0.73: corr = 0.44
1B Range: 0.16 - 0.24: corr = 0.77
xbh range: 0.08 - 0.12: corr = 0.60
out range: 0.64 - 0.76: corr = 0.76
1B Range: 0.12 - 0.28: corr = 0.93
xbh range: 0.06 - 0.14: corr = 0.86
out range: 0.58 - 0.82: corr = 0.93
As you can see, as you increase the spread of ability, you increase the likelihood that the true ordering of abilities will prevail over the course of the season.
As you also predicted, the outs correlation was actually very close to the 1B correlation. However, I am not sure I am ready to cede this point. I say this because the %out probability is NOT independent of the other two probabilities, so it is not technically an independent random variable. Does this affect things at all? I have no idea...oh, how I wish I were a statistician.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:31 a.m.,
August 7, 2003
(#46) -
Erik Allen
I agree with tango's post (#42) above. Let me restate it in my own words (to make sure we are on the same page), and then add a few thoughts of my own.
I think the basic lesson we can take from this discussion is: Year-to-year correlation coefficients depend on a lot of factors, including sample size, how often the event (e.g. hit) occurs, and the spread of talent. Therefore, a low correlation coefficient ON ITS OWN is not enough to say that a talent or persistence of ability does not exist. In fact, as the simple simulations I did above show, you can get a VERY low correlation coefficient even when a distinct talent is present.
In response to FJM: I agree that there may be constant changes in a pitcher's (or fielder's, or whomever's) ability to prevent hits on balls in play. However, as you say, we can't really separate those changes in ability from the fluctuations in BABIP that are caused by simple randomness. But this is really true of all baseball statistics. When a pitcher strikes out 12 batters in one game when he averages 7 K/9IP, we don't know if this was a random variation, or if the pitcher was really "on" that day. The question is: Can we create a model in which pitcher ability is fixed, and have that model describe the observed variability in BABIP? If we can, then the source of variability is really irrelevant.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:43 a.m.,
August 7, 2003
(#49) -
Erik Allen
In response to RossCW (#47)
I apologize for the misunderstanding...I did not explain my point about the model very well. Here is what I meant:
Pitcher BABIP is variable...it varies from game to game, and year to year. As a poster above mentioned, this variability has some element of luck, or random variation. However, it might also be variable because a pitcher's skill at preventing hits on BIP changes from game to game or from year to year. If BABIP "skill" changes from game to game, we would probably call this "streakiness." If BABIP "skill" varies from year to year, we could call this "development" or "aging." My point was simply that we can't really separate changes in skill from random variation. To simplify the universe, we simply have to start by assuming that a pitcher/fielder's skill level is fixed over the course of a multiyear period. If the variability that we see from year to year can be explained using this assumption, then we don't really need to worry about the possibility of streakiness.
I think that this is basically what Tango is saying in post 48 as well.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:52 a.m.,
August 7, 2003
(#50) -
Erik Allen
Tango,
I don't know what capabilities you have in terms of the database you are working from, but I was wondering if it would be possible to get data for pitcher seasons broken into groups by number of balls in play?
For example, could you give me a list of all pitcher seasons over the past 20 years where the pitchers had between 100 and 200 balls in play, and so on? I think we could start to understand how much variance there is in BABIP as you increase the number of balls in play.
For the idea I had, I would need the number of pitchers in each group, the average BABIP for the group, and the standard deviation on BABIP for the group.
This is pretty exciting. I think there are some real opportunities to make progress here.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 1:01 p.m.,
August 7, 2003
(#53) -
Erik Allen
Hmmm...I hadn't considered the possibility of selective sampling, but you make an excellent point. It should be interesting to see what happens to the spread.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 2:16 p.m.,
August 7, 2003
(#55) -
Erik Allen
Thanks, I had not seen that article.
Just to confirm that you see the same effect on a season-by-season basis, here are the results from the file you sent me:
#BIP BABIP
100-199 0.291
200-299 0.284
300-399 0.284
400-499 0.282
500-599 0.282
600-699 0.278
700-799 0.275
800-899 0.272
900-999 0.268
As you say, either selective sampling is at work, or there is a real difference in ability.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 6:09 p.m.,
August 7, 2003
(#58) -
Erik Allen
Well, I have some preliminary results, and they are promising (to me at least), although I don't know how much stock to put into them.
First a little background: As tango points out in his previous posts (56, 57), in the range of 200-800 BIP, the BABIP rate is very tight (0.275-0.284). I am therefore making the approximation that the pitchers that appear in this sample share a common distribution of talent at preventing hits on balls in play. I am furthermore assuming that this distribution of talent is normally distributed (i.e. bell curve shaped) about 0.281.
Here are the statistics I get for all pitcher seasons between 200 BIP and 799 BIP (based on data provided by tango):
# of seasons: 4389
average BABIP: 0.281
Standard Deviation of BABIP: 0.027
The standard deviation is a measure of the spread of the data. Basically, we can say that about 68% of all seasons should be within +/- 1 SD of average (0.254 - 0.308) and 95% will be within 2 SD. Nothing exciting here, this has all been done before.
The question that we cannot answer from the basic analysis above is: what is the standard deviation of "true" talent? For example, do all pitchers simply have the same talent level of 0.281 BABIP? Or, is there some spread to pitcher talent? What is the magnitude of this spread?
To answer these questions, we have to account for the number of trials (i.e. the number of balls in play). So, I have broken down the pitching seasons by balls in play into groups of 100. Listed below are the number of seasons in each group, and the standard deviation of the group:
#BIP #seasons STDEV
200-299 1446 0.032
300-399 812 0.0268
400-499 592 0.0245
500-599 507 0.0221
600-699 579 0.0210
700-799 454 0.0204
From the above table, you can see a clearly decreasing trend in the standard deviation of BABIP as you increase the number of BIP. And, intuitively, we can agree with this idea. After all, in a small number of trials, any number of fluky things can happen, including having a 0.400 BABIP or a 0.150 BABIP. As you increase the number of chances, the likelihood of a really fluky season decreases.
We now have standard deviations of the OBSERVED data broken down by number of balls in play. However, we also know that this observed standard deviation is not equal to the standard deviation of the TRUE talent level. For example, if all pitchers have the same inherent skill level (BABIP=0.281) the stdev of the true distribution is 0. The observed stdev will be something greater than zero.
To figure out what the true standard deviation is that matches the data, we can run a simulation. In this simulation, I set the true standard deviation of the group of pitchers, and measure the output observed standard deviation. Then, I tinker with the set value of the true standard deviation until the output standard deviation I obtain is equal to the observed standard deviation of the group. I can do this for the various numbers of balls in play.
This is already getting long, and I am probably rambling incoherently, so let me simply get to the data. The table below lists the number of balls in play range, and the TRUE standard deviation that would lead to the OBSERVED standard deviation given in tango's data.
#BIP TRUE STDEV
200-299 0.014
300-399 0.014
400-499 0.012
500-599 0.012
600-699 0.012
700-799 0.012
I was ecstatic, to say the least. What I see above is a remarkably consistent picture of pitcher ability. It seems that, as a rough estimate, we can say that pitcher abilities are normally distributed about BABIP 0.281 with a standard deviation of 0.012 or so.
Roughly 2/3 of all pitchers should have a TRUE BABIP rate of 0.269-0.293. Roughly 95% of pitchers should have a true BABIP rate of 0.257-0.305. If this stands up, it is useful because it means that when a pitcher has a season with a 0.250BABIP, we can hypothetically give an estimate of his TRUE BABIP rate.
There are a ton of holes that can be poked in this, being as rough a calculation as it is, and I welcome any and all comments.
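As a cross-check on the trial-and-error fitting, the same deconvolution can be sketched analytically: under a binomial model, observed variance is approximately talent variance plus binomial sampling variance p(1-p)/n, so the talent stdev can be backed out directly. A rough sketch (I use each bucket's midpoint as its BIP count, which ignores the spread of workloads within a bucket):

```python
import math

P_AVG = 0.281  # league-average BABIP from the data above

def true_std(observed_std, n_bip):
    """Back out the spread of true talent: observed variance is
    (approximately) talent variance plus binomial sampling variance."""
    sampling_var = P_AVG * (1 - P_AVG) / n_bip
    return math.sqrt(max(observed_std ** 2 - sampling_var, 0.0))

# (bucket midpoint BIP, observed stdev) from the table above
buckets = [(250, 0.032), (350, 0.0268), (450, 0.0245),
           (550, 0.0221), (650, 0.0210), (750, 0.0204)]
for n, obs in buckets:
    print(f"{n} BIP: true stdev ~ {true_std(obs, n):.4f}")
```

These land in roughly the same 0.011-0.015 range as the simulation, without matching it exactly (the midpoint approximation is crude).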
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 6:44 p.m.,
August 7, 2003
(#60) -
Erik Allen
Tango,
You mention that the stdev of 0.012 is twice what you would have expected. Can you explain what that expectation was based on? Perhaps I am missing something in my analysis.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 9:17 p.m.,
August 7, 2003
(#63) -
Erik Allen
D'oh! Defense and park factors are always getting in the way of beautifully simple theories.
Tango, your comments in 61 make me realize that I was getting ahead of myself. All the simulation above tells us is that we can match the observed experimental distribution if pitcher BABIP rates are normally distributed with a stdev of 0.012. We have not yet made any claims as to why this distribution of true BABIP rates exists...is it due to the pitcher, the defense, or the park? This is obviously a key question in predicting abilities going forward.
However, I am not sure that I agree with the modifications you mention...I think one of the ideas that you have put forward very clearly during this debate is that we need to reconsider how important the defense and pitcher contributions are to BABIP. If we assign 0.08 to the defense and the rest to the pitching, it seems to me that we have just introduced our own biases into the equation.
I wonder if you have given any more thought to the question of comparing a theoretical correlation coefficient to an experimental coefficient, as a basis of predicting control?
By the way, thank you so much for making that data file...it was amazing how quickly you were able to generate it!
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 1:44 p.m.,
August 8, 2003
(#66) -
Erik Allen
Okay, I see where you are going now.
I have a few questions/comments, all of which can be dealt with (I think).
1) Both the park effect and the defensive deviations you mention are the observed standard deviations, correct? If so, I would think that the measured stdev is larger than the true stdev as in the general case. I think we can account for this somehow.
2) I just read the UZR Primer by Mitchel Lichtman. However, he focuses on individual performance, not team performance. Do you have a good article relating to team UZR?
3) It appears that UZR ignores certain outcomes (pop flies?) which would not give credit to pitchers who were able to induce lots of pop flies. I am worried this (or other effects) might give more credit to the defense than is due.
4) Has anyone done a comparison of year-to-year correlation of pitchers who remain with the same team, versus pitchers who change teams? This seems like it might provide some insight into how much control a pitcher has.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:20 a.m.,
August 9, 2003
(#68) -
Erik Allen(e-mail)
Okay, I have run a few simulations to try and tease out some of the park, defense, and pitching dependence. I will break this info into different posts, to avoid excessive length.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:42 a.m.,
August 9, 2003
(#69) -
Erik Allen
Before getting into more complicated simulations, I thought it would be appropriate to first look at some "extreme" cases.
The first question one might ask is: Is the pitcher ENTIRELY responsible? The answer is almost certainly no, but it might be instructive to see what kind of results you would expect if such was the case.
What I did in this set of simulations, then, was to randomly assign 10,000 pitchers a BABIP skill level (for example, 0.281, or 0.250, etc.). These skill levels are normally distributed about 0.281 with a standard deviation of 0.012 (as found previously to fit the data). We assume that this BABIP level is ALWAYS their true level. Then, I simulate 2 separate seasons, and record an OBSERVED BABIP level for each pitcher each season (this would correspond to their actual major league performance). I then measure the correlation coefficient for the year-over-year data, and compare it to the correlation coefficient tango found in his study (0.15, see post 6).
Okay, wordy, I know, so let's get to the data: In the data file tango sent me, there were 4389 pitchers that had between 200 and 800 BIP in a given season. The average number of BIP was around 430. Therefore, I let each pitcher have 430 BIP in each season.
#BIP = 430 for both seasons, for all pitchers: r = 0.24
As we would expect, the correlation coefficient is too large. One modification we could make to change the outcome slightly would be to assign different pitchers different numbers of balls in play, to more closely reflect reality. When I do this (e-mail me if you want more methodology), I get r = 0.21. Still too large.
The "Well, duh!" conclusion is that BABIP talent does not lie solely with the pitcher.
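As a sanity check on that simulated r: under the same binomial model, the expected year-to-year correlation is just the reliability ratio, talent variance over (talent variance plus sampling variance). A sketch at the average workload of 430 BIP, using the values from the posts above:

```python
P_AVG, TALENT_STD, N_BIP = 0.281, 0.012, 430  # values from the posts above

talent_var = TALENT_STD ** 2
sampling_var = P_AVG * (1 - P_AVG) / N_BIP  # binomial noise in one season
r_expected = talent_var / (talent_var + sampling_var)
print(f"expected year-to-year r at {N_BIP} BIP: {r_expected:.2f}")
```

This gives about 0.23, in line with the simulated 0.24, and makes explicit why the pitcher-only assumption overshoots the observed 0.15.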
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:55 a.m.,
August 9, 2003
(#70) -
Erik Allen
(homepage)
A second extreme case would be to ask if the data can be explained solely on the basis of park factors. That is, the pitching has no influence, the defense has no influence, only the effect of the park determines the BABIP rate.
Tango has a list of BABIP park factors at his website (see homepage link). You divide these factors by 2 to get a team's park effect over the course of a season. The standard deviation of this distribution is 0.004.
In 2002, the average team had around 4550 BIP over the course of a season. Using the same methodology as above, I assign each team a BABIP level based on a normal distribution with standard deviation 0.004.
The correlation coefficient for this case is r = 0.25.
The year-to-year correlation coefficient on a team level is more like r = 0.6. So, clearly, the park is not the only factor either.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 12:15 p.m.,
August 9, 2003
(#71) -
Erik Allen(e-mail)
(homepage)
In Tippett's article on BABIP (see homepage link), he finds that the correlation coefficient for pitchers, _relative to their team_, is 0.09. Now, I am starting to get on shaky ground here, but if I assume that all pitchers are affected in the same way by a given defense or a given park (not entirely true obviously), then we can view BABIP relative to the team as a measure of pitcher ability.
To do the simulation, I assume that pitcher ability, relative to their team environment, is normally distributed. I try different standard deviations of this distribution, and measure the resulting correlation coefficient:
stdev_talent r
0.006 0.061
0.007 0.082
0.008 0.11
0.010 0.16
0.012 0.21
Based on the chart above, it appears that a standard deviation of talent of around 0.007 to 0.008 would be appropriate.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 12:26 p.m.,
August 9, 2003
(#72) -
Erik Allen(e-mail)
I wanted to write one more post concerning some ideas for future research, most of which I would not know how to do:
1. One study that could potentially confirm this work would be to study the year-to-year correlation of pitchers who had between 200-800 BIP and remained on the same team both years. For this group of pitchers, one can conceivably imagine that the talent level of (pitcher+defense+park) would be fairly consistent year-over-year. Would you get a correlation coefficient closer to 0.21 for this group (as in post 69)?
2. Look closer at UZR to determine what an appropriate defensive stdev is (this I think I can do, after some more thought).
3. What year-over-year r do you get for pitchers that have changed teams (not sure if there is enough data for this.) Tango mentioned that Tippett addressed this, but I was not able to find it mentioned.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 12:46 p.m.,
August 9, 2003
(#74) -
Erik Allen
Patriot, I think you are correct in your assessment. In retrospect, post 71 hasn't really proved much, because you still have to make some key assumptions regarding the interaction of pitching and defense.
I don't really have a ton of good ideas beyond it, however.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:15 a.m.,
August 12, 2003
(#81) -
Erik Allen
Sorry for the delay since my last post...I had to do some "real" work.
I ran the simulations that tango suggested. That is, I introduced a random, normally distributed defensive factor for each pitcher. Tango sets the standard deviation of the defensive contribution at 0.008. However, since I didn't know the exact basis for this number, I ran the simulation under 2 assumptions:
Case 1:
Assumptions:
1. 0.008 is the _observed_ stdev of defensive talent, AFTER ACCOUNTING FOR PARK EFFECTS.
2. The talent of a defense is independent of the park they play in (i.e. the park effect and the defensive ability of the team are independent variables).
In Case 1, we need to determine the true standard deviation of defensive ability, since the observed standard deviation is larger than the true standard deviation. To do so, I ran my simulation at different levels of true standard deviation, and measured the output stdev (each team was given 4550 BIP). I get a true standard deviation of 0.0045.
After doing this, I can compute the true standard deviation of pitcher ability. I use the same tactic as in previous posts, changing the true stdev to match the observed stdev for different levels of BIP. The stdev of pitchers in Case 1 is 0.010. Table 1 presents some data I get from such an analysis:
#BIP Simulation_stdev Real_life_stdev
250 0.0308 0.0321
350 0.0266 0.0269
450 0.0241 0.0245
550 0.0225 0.0221
650 0.0211 0.0210
750 0.0202 0.0204
As you can see, the simulation stdev matches the "real life" stdev in every case except for the pitchers in the 250 BIP range. This is a problem I have been having fairly consistently. I think it could be explained by a number of factors:
1. 100 BIP is too wide a range to use at such a low number of BIP
2. The range of talent for pitchers in this group is larger
Anyway, for case 1, we see the pitching/defense/park breakdown is 0.010/0.0045/0.004
P.S. I also calculated a correlation coefficient as described previously (pitchers assigned 200-800 BIPs according to the major league distribution). I get r = 0.20. Still too high (as expected)
Case 2:
Assumptions:
1. 0.008 is the true stdev of defensive talent
2. Defensive talent is independent of park
The only difference from case 1 is that I now use 0.008 for the stdev of defensive talent. Using the same procedures as above, I get a pitcher stdev of 0.007. See table for simulation details:
#BIP Simulation_stdev Real_life_stdev
250 0.0306 0.0321
350 0.0266 0.0269
450 0.0240 0.0245
550 0.0222 0.0221
650 0.0209 0.0210
750 0.0199 0.0204
In Case 2, the pitching/fielding/park breakdown is 0.007/0.008/0.004
P.S. I also calculated a correlation coefficient as described previously (pitchers assigned 200-800 BIPs according to the major league distribution). I get r = 0.19. Still too high (as expected)
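One quick consistency check on both breakdowns: if the pitching, defense, and park components are independent (an assumption), their standard deviations should combine in quadrature to roughly the overall talent stdev of 0.012 found earlier. A sketch:

```python
import math

def combined_std(components):
    """Independent variance components add; stdevs combine in quadrature."""
    return math.sqrt(sum(s ** 2 for s in components))

# pitching / defense / park stdevs from the two cases above
case1 = combined_std([0.010, 0.0045, 0.004])
case2 = combined_std([0.007, 0.008, 0.004])
print(f"case 1 total: {case1:.4f}")
print(f"case 2 total: {case2:.4f}")
```

Both come out a shade under 0.012, which is about as close as these rough estimates can be expected to get.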
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:03 a.m.,
August 12, 2003
(#82) -
Erik Allen
Actually, one modification to the above numbers...
For case 2, 0.007 appears to be a bit low for the pitcher distribution estimate. 0.0075 appears a little better, and 0.008 might work also...the estimates are not perfect.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:42 p.m.,
August 14, 2003
(#106) -
Erik Allen
Tango,
Thanks for all the data. I will try and run some simulations, and maybe have results this weekend or Monday.
I was wondering (if this is easy to compute) what r value you calculate for year-to-year correlation of pitchers that did not change teams? Is that information available, or too tough to get at?
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:39 a.m.,
August 15, 2003
(#107) -
Erik Allen
Tango, questions from post #104:
1. The standard deviations you are providing here are for UZR, correct?
2. In post 65, you list the overall team standard deviation at 0.010, whereas here you list it at 0.008. Did you get different results the second time, or did you write 0.008 because this was already the value agreed upon?
3. How many opportunities do teams typically get for ground balls and flyballs? Overall, the total BIP is around 4550, but what is the IF/OF or GB/FB breakdown? Ideally, a distribution would be best (i.e. 10 teams had 2000-2100 FB, 5 teams had 2100-2200 FB, etc.) but just average numbers would be okay for a first pass.
4. Same question as number three except broken down by position.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:47 a.m.,
August 15, 2003
(#108) -
Erik Allen
To Arvin and Chris R:
Sorry for the delayed response, but I am not a statistician, so it took me a while to understand what you are saying. :)
In response to Chris R, post 86: Your summary of my methodology is essentially correct.
In response to Arvin, post 92: I am not entirely sure I understand what you mean, but essentially, I think you are saying that the observed variance is not the same as the population variance. I agree with this totally, and it is one of the main things I am taking into account in my simulation. However, if you have a way to calculate it analytically, all the better.
To Chris R, post 93: I agree that defenses affect pitchers differently, and this needs to be the next step. This ultimately means I cannot select a defense and a pitcher ability randomly. However, I disagree that the true variance of the defense depends on the number of opportunities. The true (population) variance is independent of the number of BIP, but the observed variance will depend on sample size.
Thank you both for your insightful comments, and once again, sorry that I did not respond sooner.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:51 a.m.,
August 15, 2003
(#109) -
Erik Allen
Tango, one more question:
5. Are the standard deviations you provide already park-adjusted? This probably won't make a significant difference, but just asking.
Solving DIPS (August 20, 2003)
Posted 11:53 a.m.,
August 20, 2003
(#1) -
Erik Allen(e-mail)
Really nice presentation Tango. I would say that you hit all the high points of the discussion.
I have held off on doing more simulations to this point, since it appears you can also do this work analytically, but I can run some situations if you feel there is a need.
I have not yet said thanks to Arvin as well, for all his work. Thanks, Arvin!
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 2:33 p.m.,
December 11, 2003
(#4) -
Erik Allen
Amen, brother. When the computers disagree with the ESPN polls, the talking-head response is to say that the computers are flawed since they cannot replicate the human poll process. I tend to turn the tables: what does it say about human fallibility that we can't have a press poll that reproduces the results of a logical computer ranking system?
Incidentally, I think it would be quite easy to write a computer ranking system that replicated the coaches' and writers' polls: a team's ranking would be weighted heavily toward its previous week's ranking, with an automatic drop in ranking for any loss. However, merely stating the "logic" of such a rating system shows what a ridiculous notion it is.
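Purely to make the point concrete, here is a deliberately silly sketch of such a "poll replicator" (the three-spot penalty for a loss is an arbitrary, made-up parameter):

```python
def update_rankings(prev_rank, losers, loss_penalty=3):
    """Toy 'press poll' ranker: a team's new rank is driven almost entirely
    by last week's rank, with a fixed penalty pushing down any team that lost."""
    score = {team: rank + (loss_penalty if team in losers else 0)
             for team, rank in prev_rank.items()}
    ordered = sorted(score, key=score.get)   # lower score = better rank
    return {team: i + 1 for i, team in enumerate(ordered)}

# Week 1 poll: A > B > C; B loses, A and C win.
print(update_rankings({"A": 1, "B": 2, "C": 3}, losers={"B"}))
```

That the whole "system" fits in a few lines, and never looks at who anyone actually played, is the point.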
EA
Best Fielding Teams, 2003 (December 28, 2003)
Posted 4:22 p.m.,
December 30, 2003
(#2) -
Erik Allen(e-mail)
Tango (or anyone else in the know)...
One thing I have always been confused about concerning UZR and other such ranking systems is the use or non-use of the speed of the batted ball. Does UZR or Pinto's model account for this?
It seems to me that this is an essential part of evaluating the role of the fielder versus the role of the pitcher, although I recognize how difficult it must be to obtain data like that. If there is some inherent BABIP ability for pitchers, it seems to me that it would almost assuredly be linked to a pitcher's unique distribution of ball velocities as they leave the bat.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 9:12 a.m.,
February 3, 2004
(#9) -
Erik Allen(e-mail)
Very interesting study! I think the research method used here is sound, and is very similar to what was used in the Solving DIPS thread this summer.
A few comments...first, I think the fact that better hitters tend to hit worse in the clutch is entirely consistent with the idea that better pitchers are pitching in these situations. To illustrate this, we can estimate the outcome of a particular batter/pitcher matchup using Bill James's Log5 formula. Consider the extreme example of bringing in a relief ace whose OBP against is 0.300, versus a league average of 0.340. The first batter to come to the plate has a true OBP of 0.400. The Log5 formula predicts the hitter will get on base at about a 0.357 clip in this situation. In other words, his OBP is reduced by 0.043. Next, a poor hitter with a 0.300 OBP comes up. Log5 predicts a matchup OBP of 0.263, a drop of only 0.037. Better hitters should be more strongly affected by the presence of a good pitcher than poor hitters.
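The arithmetic above can be reproduced directly; this uses the standard rate form of Log5, with the 0.300/0.340/0.400 figures being the post's example numbers:

```python
def log5(batter, pitcher, league):
    """Bill James's Log5 estimate of the expected rate for a batter/pitcher matchup."""
    num = batter * pitcher / league
    den = num + (1 - batter) * (1 - pitcher) / (1 - league)
    return num / den

LEAGUE = 0.340
ACE = 0.300   # relief ace's OBP against

for true_obp in (0.400, 0.300):
    matchup = log5(true_obp, ACE, LEAGUE)
    print(f"true OBP {true_obp:.3f} -> matchup OBP {matchup:.3f} "
          f"(drop of {true_obp - matchup:.3f})")
```

The good hitter loses about 43 points of OBP against the ace, the poor hitter only about 37, so the better hitter is hurt more in absolute terms.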
A second comment is an idea for a follow-up study. Although you have demonstrated that a spread of clutch hitting ability exists, you can also use the MC simulation to estimate the persistence of clutch hitting ability. To do this, create a set of imaginary players with the distribution of clutch hitting abilities found in your study. Then, simulate two consecutive seasons for each of these players, and measure the y-t-y correlation. If this correlation coefficient is significantly higher than 0.04, then it raises some questions about the predictability of clutch hitting ability.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 2:11 p.m.,
February 17, 2004
(#20) -
Erik Allen
I think one very important factor that is neglected in the "x wins is worth y dollars" analysis is that teams cannot be value investors in the sense that one can be a value investor in the stock market. In the stock market, an individual investor has, for practical purposes, an infinite number of investment options; he can put money only into the best deals and still find a home for all of it (Buffett excluded). A baseball team is constrained to only 25 "investments" (i.e., players), of which only ~15 can play a substantial role in the success of the team. This constraint means that marquee players should command salaries in excess of what a "Wins*$/Win" type analysis alone predicts.
I seem to remember a study a few months ago that plotted WS/$ against $ and found a negative slope. I would argue that this phenomenon is not a market inefficiency, but rather exactly what you would expect given the value of marquee players.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 8:21 p.m.,
February 17, 2004
(#26) -
Erik Allen
Studes, thanks for the response! Actually, your response reminded me that it was you who did the WS/$ study.
My point was essentially this: according to the research you did, the best values in the major leagues are cheap players. Their Win Shares per dollar spent are higher than the typical superstar's, correct? So, suppose a team could have as many players as it wanted, and they could all contribute to the team's success. The best team money could buy (in terms of Win Shares per dollar) would be a team of barely-above-replacement-level players.
In real life, teams are limited to ~15 contributors, and the strategy of picking up 15 slightly-above-replacement-level players would be terrible. Another way of saying it is that we can't talk about a player's value without talking about the "opportunity cost" of his roster spot. Sure, a cheap player is the best value per dollar, but every cheap player we pick up uses up a roster spot.
So, what are the ramifications of this analysis? Well, say you have your choice of A-Rod versus two players, each with exactly half the value of A-Rod. The strict "$x buys y wins" analysis says these two options are equally good. When you consider the opportunity cost of the roster spots, however, A-Rod should be worth more, since the freed roster spot is available to pick up something else of value.
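The roster-slot argument in numbers, with all figures hypothetical: suppose A-Rod is worth 8 wins above replacement at $22M, each "half-A-Rod" is worth 4 wins at $11M, and a minimum-salary filler adds 1 win for $0.3M, with two roster slots to fill:

```python
# (wins above replacement, salary in $000s) -- all hypothetical figures
AROD = (8, 22_000)
HALF_AROD = (4, 11_000)
FILLER = (1, 300)    # minimum-salary pickup

def totals(roster):
    """Sum wins and payroll over a list of (wins, salary) players."""
    wins = sum(w for w, _ in roster)
    cost = sum(c for _, c in roster)
    return wins, cost

option_a = [AROD, FILLER]           # star plus cheap filler in the freed slot
option_b = [HALF_AROD, HALF_AROD]   # two mid-tier players, both slots used

print("A-Rod + filler: ", totals(option_a))   # (9, 22300)
print("two half-A-Rods:", totals(option_b))   # (8, 22000)
```

Per dollar the two rosters look nearly identical, but the concentrated-value roster wins an extra game because the second slot was free to add something, which is the opportunity-cost point above.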